Quantifying the Effects of Correlated Covariates on Variable Importance Estimates from Random Forests

Authors

  • Ryan Vincent Kimes
Abstract

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science at Virginia Commonwealth University, 2006. Major Director: Kellie J. Archer, Ph.D., Assistant Professor, Department of Biostatistics.

Recent advances in computing technology have led to the development of algorithmic modeling techniques. These methods can be used to analyze data that are difficult to analyze with traditional statistical models. This study examined how effectively variable importance estimates from the random forest algorithm identify the true predictor among a large number of candidate predictors. A simulation study was conducted using twenty different levels of association among the independent variables and seven different levels of association between the true predictor and the ...
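The abstract does not state the simulation's distributional details, so the following Python sketch assumes one simple design purely for illustration: p standard-normal covariates sharing a common pairwise correlation rho (one "level of association among the independent variables"), and a continuous response driven only by the first covariate, with correlation r (one "level of association" with the true predictor). The function names simulate and pearson are hypothetical, and fitting the random forest and computing its importance scores are omitted. The sketch shows only the mechanism the study probes: since Cov(X_j, y) = r * Cov(X_j, X_0) = r * rho, every non-causal covariate carries a marginal correlation of r * rho with the response, which is signal an importance measure may credit to it.

```python
import math
import random


def simulate(n, p, rho, r, seed=0):
    """Simulate p equicorrelated N(0, 1) covariates and one response.

    Hypothetical design (an assumption, not the thesis's exact setup):
    every pair of covariates has correlation rho (0 <= rho <= 1), and only
    X[0] is the true predictor, with corr(X[0], y) = r (-1 <= r <= 1).
    Returns X as a list of p columns, each of length n, and y of length n.
    """
    rng = random.Random(seed)
    X = [[0.0] * n for _ in range(p)]
    y = [0.0] * n
    for i in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor gives Cov(X[j], X[k]) = rho
        for j in range(p):
            X[j][i] = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        # Var(y) = r^2 + (1 - r^2) = 1, so corr(X[0], y) = r
        y[i] = r * X[0][i] + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
    return X, y


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    X, y = simulate(n=20000, p=5, rho=0.5, r=0.8)
    for j, col in enumerate(X):
        # Population value: r = 0.8 for X[0], r * rho = 0.4 for the others
        print(f"X[{j}]: corr with y = {pearson(col, y):.3f}")
```

With rho = 0.5 and r = 0.8, the printed correlations should land near 0.8 for X[0] and near 0.4 for the other columns. Under these assumptions, sweeping rho and r over grids of values would mimic the kind of factorial design the abstract describes.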

Similar Resources

Comparison of Survival Forests in Analyzing First Birth Interval

Background and objectives: The application of statistical machine learning methods, such as ensemble-based approaches, to time-to-event data in survival analysis has received considerable interest over the past decades. One of these practical methods is survival forests, which have been developed in a variety of contexts because of their high precision and their non-parametric, non-linear nature. This...

The Performance of small samples in quantifying structure central Zagros forests utilizing the indexes based on the nearest neighbors

Today, forest structure has become one of the main ecological debates in forest science. Determining the characteristics of forest structure is necessary to investigate the process of stand change, for silvicultural interventions, and for planning revival operations. To investigate the structure of part of the Ghale-Gol forests in Khorramabad, a set of indices such as Cla...

Beta - Binomial and Ordinal Joint Model with Random Effects for Analyzing Mixed Longitudinal Responses

The analysis of discrete mixed responses is an important statistical issue in various sciences. Ordinal and overdispersed binomial variables are discrete. Overdispersed binomial data are a sum of correlated Bernoulli experiments with equal success probabilities. In this paper, a joint model with random effects is proposed for analyzing mixed overdispersed binomial and ordinal longitudinal respo...

ggRandomForests: Survival with Random Forests

Random Forests (Breiman 2001) (RF) are a fully non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF are a robust, nonlinear technique that optimizes predictive accuracy by fitting an ensemble of trees to stabilize model estimates. Random Forests for survival (Ishwaran and Kogalur 2007; Ishwaran, Kogalur, Blackstone, and Lauer 2008) ...

Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance

Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance, e.g., in genetics and bioinformatics. We highlight both advantages and limitations of different variable importance scores and associated testing procedures, especially in the context of correlated pred...

Publication date: 2006